Abstract

This paper gives a brief overview of nonparametric techniques that are useful for financial
econometric problems. The problems include estimation and inference for the instantaneous return
and volatility functions of time-homogeneous and time-dependent diffusion processes, and
estimation of transition densities and state price densities. We first briefly describe the problems
and then outline the main techniques and main results. Some useful probabilistic aspects of diffusion
processes are also briefly summarized to facilitate our presentation and applications.

1 Introduction 

Technological invention and trade globalization have brought us into a new era of financial markets.
Over the last three decades, an enormous number of new financial products have been introduced to
meet customers' demands. An important milestone is that in 1973 the world's first options
exchange opened in Chicago. In the same year, Black and Scholes (1973) published their famous
paper on option pricing and Merton (1973) launched the general equilibrium model for security pricing,
two landmarks for modern asset pricing. Since then, the derivative markets have experienced
extraordinary growth. Professionals in finance now routinely use sophisticated statistical techniques
and modern computing power in portfolio management, securities regulation, proprietary trading,
financial consulting and risk management.

Financial econometrics is an active field that integrates finance, economics, probability, statistics,
and applied mathematics. This is exemplified in the books by Campbell et al. (1997), Gourieroux
and Jasiak (2001), and Cochrane (2001). Financial activities generate many new problems, economics
provides useful theoretical foundations and guidance, and quantitative methods such as
statistics, probability and applied mathematics are essential tools for solving the quantitative
problems in finance. To name a few, complex financial products pose new challenges for valuation
and risk management. Sophisticated stochastic models have been introduced to capture the salient
features of underlying economic variables and are used for security pricing. Statistical tools are used
to identify parameters of stochastic models, to simulate complex financial systems and to test
economic theories via empirical financial data.

An important area of financial econometrics is to study the expected returns and volatilities of 
the price dynamics of stocks and bonds. Returns and volatilities are directly related to asset pricing, 
proprietary trading, security regulation and portfolio management. To achieve these objectives, the 
stochastic dynamics of underlying state variables should be correctly specified. For example, option 
pricing theory allows one to value stock or index options and hedge against the risks of option 
writers, once a model for the dynamics of underlying state variables is given. See, for example, the 
books on mathematical finance by Bingham and Kiesel (1998), Steele (2000), and Duffie (2001). 
Yet, many of the stochastic models in use are simple and convenient ones, chosen to facilitate
mathematical derivations and statistical inferences. They are not derived from any economic theory
and hence cannot be expected to fit all financial data. Thus, while the pricing theory gives spectacularly
beautiful formulas when the underlying dynamics are correctly specified, it offers little guidance in
choosing or validating a model. There is always a danger that misspecification of a model leads to
erroneous valuation and hedging strategies. Hence, there is a genuine need for flexible stochastic
modeling. Nonparametric methods offer a unified and elegant treatment for such a purpose.

Nonparametric approaches have recently been introduced to estimate return, volatility, transition
densities and state price densities of stock prices and bond yields (interest rates). They are
also useful for examining the extent to which the dynamics of stock prices and bond yields vary over
time. They have immediate applications to the valuation of bond prices and stock options and to the
management of market risks. They can also be employed to test economic theories such as the capital
asset pricing model and the stochastic discount model (Campbell et al. 1997) and to answer questions
such as whether the geometric Brownian motion fits certain stock indices, whether the Cox-Ingersoll-Ross
model fits yields of bonds, and whether interest rate dynamics evolve with time. Furthermore, based
on empirical data, one can also fit directly the observed option prices with their associated
characteristics such as strike price, time to maturity, risk-free interest rate and dividend yield, and see
if the option prices are consistent with the theoretical ones. Nonparametric techniques can be
expected to play an increasingly important role in financial econometrics, thanks to the availability
of modern computing power and the development of financial econometrics.

The paper is organized as follows. We first introduce in section 2 some useful stochastic models 
for modeling stock prices and bond yields and then briefly outline some probabilistic aspects of 
the models. In section 3, we review nonparametric techniques used for estimating the drift and 
diffusion functions, based on either discretely or continuously observed data. In section 4, we
outline techniques for estimating state price densities and transition densities. Their applications 
in asset pricing and testing for parametric diffusion models are also introduced. Section 5 makes 
some concluding remarks. 

2 Stochastic diffusion models 

Much of financial econometrics is concerned with asset pricing, portfolio choice and risk management.
Stochastic diffusion models have been widely used for describing the dynamics of underlying economic
variables and asset prices. They form the basis of many spectacularly beautiful formulas for
pricing contingent claims. For an introduction to financial derivatives, see Hull (2003).

2.1 One-factor diffusion models 

Let \(S_{t\Delta}\) denote the stock price observed at time \(t\Delta\). The time unit can be hourly, daily, weekly,
among others. Presented in Figure 1(a) are the daily log-returns, defined as

\[\log(S_{t\Delta}) - \log(S_{(t-1)\Delta}) \approx (S_{t\Delta} - S_{(t-1)\Delta})/S_{(t-1)\Delta},\]

of the Standard and Poor 500 index, which is a value-weighted index based on the prices of the 500
stocks that account for approximately 70% of the total U.S. equity (stock) market capitalization.
The stylized features of the returns include that the volatility tends to cluster and that the mean and
variance of the returns tend to be constant. One simplified model to capture the second feature is
that

\[\log(S_{t\Delta}) - \log(S_{(t-1)\Delta}) \approx \mu_0 + \sigma_0 \varepsilon_t,\]

where \(\{\varepsilon_t\}\) is a sequence of independent normal random variables. This is basically a random
walk hypothesis, regarding the stock price movement as an independent random walk. When the
sampling time unit \(\Delta\) gets small, the above random walk can be regarded as a random sample from
the continuous-time process:

\[d\log(S_t) = \mu_0 + \sigma_1\,dW_t, \qquad (1)\]

where \(\{W_t\}\) is a standard one-dimensional Brownian motion and \(\sigma_1 = \sigma_0/\sqrt{\Delta}\). The process (1)
is called geometric Brownian motion, as \(S_t\) is an exponential of the Brownian motion \(W_t\). It was used
by Osborne (1959) to model stock price dynamics and by Black and Scholes (1973) to derive their
celebrated option pricing formula.

Figure 1: (a) Daily log-returns of the Standard & Poor 500 index from October 21, 1980 to July 29,
2004. (b) Scatter plot of the returns against the logarithm of the index (price level). (c) Interest rates
of two-year US Treasury notes from June 4, 1976 to March 7, 2007, sampled at weekly frequency.
(d) Scatter plot of the difference of yields versus the yields.

Interest rates are fundamental to financial markets, consumer spending, corporate earnings,
asset pricing, inflation and the economy. The bond market is even bigger than the equity market.
Presented in Figure 1(c) are the interest rates \(\{r_t\}\) of the two-year US Treasury notes at weekly
frequency. As the interest rates get higher, so do the volatilities. To appreciate this, Figure 1(d)
plots the pairs \(\{(r_{t-1}, r_t - r_{t-1})\}\). The dynamics are very different from those of the equity market.
The interest rates should be non-negative. They possess heteroscedasticity in addition to the
mean-reversion property: as the interest rates rise above the mean level \(\alpha\), there is a negative drift
that pulls the rates down, while when the interest rates fall below \(\alpha\), there is a positive force that
drives the rates up. To capture these two main features, Cox et al. (1985) derived the following model
for the interest rate dynamics:

\[dr_t = \kappa(\alpha - r_t)\,dt + \sigma r_t^{1/2}\,dW_t. \qquad (2)\]

For simplicity, we will refer to it as the CIR model. It is an amelioration of the Vasicek (1977)
model

\[dr_t = \kappa(\alpha - r_t)\,dt + \sigma\,dW_t, \qquad (3)\]

which ignores the heteroscedasticity and is also referred to as the Ornstein-Uhlenbeck process.
While this is an unrealistic model for interest rates, the process is Gaussian with an explicit transition
density. In fact, the time series sampled from (3) follows the autoregressive model of order 1:

\[Y_t = (1 - \rho)\alpha + \rho Y_{t-1} + \varepsilon_t, \qquad (4)\]

where \(Y_t = r_{t\Delta}\), \(\varepsilon_t \sim N(0, \sigma^2(1 - \rho^2)/(2\kappa))\) and \(\rho = \exp(-\kappa\Delta)\). Hence, the process is well understood
and usually serves as a test case for proposed statistical methods.

There are many stochastic models that have been introduced to model the dynamics of stocks
and bonds. Let \(X_t\) be an observed economic variable at time \(t\). This can be the price of a stock
or a stock index, or the yield of a bond. A simple and frequently used stochastic model is

\[dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t. \qquad (5)\]

The function \(\mu(\cdot)\) is frequently called a drift or instantaneous return function and \(\sigma(\cdot)\) is referred
to as a diffusion or volatility function, since

\[\mu(X_t) = \lim_{\Delta \to 0} \Delta^{-1} E(X_{t+\Delta} - X_t | X_t), \qquad \sigma^2(X_t) = \lim_{\Delta \to 0} \Delta^{-1} \mathrm{var}(X_{t+\Delta} | X_t).\]

The time-homogeneous model (5) contains many famous one-factor models in financial econometrics.
In an effort to improve the flexibility of modeling interest rate dynamics, Chan et al. (1992)
extend the CIR model (2) to the CKLS model

\[dX_t = \kappa(\alpha - X_t)\,dt + \sigma X_t^{\gamma}\,dW_t. \qquad (6)\]

Aït-Sahalia (1996b) introduces nonlinear mean reversion: while interest rates remain in the middle
part of their domain, there is little mean reversion and at the end of the domain, strong nonlinear
mean reversion emerges. He imposes a nonlinear drift of the form \((\alpha_0 X_t^{-1} + \alpha_1 + \alpha_2 X_t + \alpha_3 X_t^2)\). See
also Ahn and Gao (1999), who model the interest rates by \(Y_t = X_t^{-1}\), in which \(X_t\) follows
the CIR model.

Economic conditions vary over time. Thus, it is reasonable to expect that the instantaneous
return and volatility depend on both time and price level for a given state variable, such as stock
prices and bond yields. This leads to a further generalization of model (5) to allow the coefficients
to depend on time \(t\):

\[dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t. \qquad (7)\]

Since only a trajectory of the process is observed [see Figure 1(c)], there is not sufficient information
to estimate the bivariate functions in (7) without further restrictions. A useful specification of
model (7) is

\[dX_t = \{\alpha_0(t) + \alpha_1(t) X_t\}\,dt + \beta_0(t) X_t^{\beta_1(t)}\,dW_t. \qquad (8)\]

This is an extension of the CKLS model (6) by allowing the coefficients to depend on time and
was introduced and studied by Fan et al. (2003). Model (8) includes many commonly used
time-varying models for the yields of bonds, introduced by Ho and Lee (1986), Hull and White (1990),
Black, Derman and Toy (1990), Black and Karasinski (1991), among others. The experience in
Fan et al. (2003) and other studies of the varying coefficient models (Chen and Tsay 1993, Hastie
and Tibshirani 1993, Cai et al. 2000) shows that the coefficient functions in (8) cannot be estimated
reliably due to the collinearity effect in local estimation: localizing in the time domain, the process
\(\{X_t\}\) is nearly constant. This leads Fan et al. (2003) to introduce the semiparametric model

\[dX_t = \{\alpha_0(t) + \alpha_1 X_t\}\,dt + \beta_0(t) X_t^{\beta}\,dW_t \qquad (9)\]

to avoid the collinearity.
2.2 Some probabilistic aspects

A question arises naturally as to when there exists a solution to the stochastic differential equation
(SDE) (7). Such a program was first carried out by Itô (1942, 1946). For SDE (7), there are two different
meanings of solution: strong solution and weak solution. See Sections 5.2 and 5.3 of Karatzas
and Shreve (1991). Basically, for a given initial condition \(\xi\), a strong solution requires that \(X_t\)
is determined completely by the information up to time \(t\). Under the Lipschitz and linear growth
conditions on the drift and diffusion functions, for every \(\xi\) that is independent of \(\{W_s\}\), there exists
a strong solution of equation (7). Such a solution is unique. See Theorem 2.9 of Karatzas and
Shreve (1991).

For a one-dimensional time-homogeneous diffusion process, weaker conditions can be obtained
for the so-called weak solution. By an application of Itô's formula to an appropriate transform
of the process, one can make the drift zero. Thus, we can consider without loss of generality
that the drift in (5) is zero. For such a model, Engelbert and Schmidt (1984) give a necessary and
sufficient condition for the existence of the solution. The continuity of \(\sigma\) suffices for the existence of
the weak solution. See Theorem 5.5.4 (page 333) of Karatzas and Shreve (1991) and Theorem 23.1
of Kallenberg (2001).

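We will use the Itô formula several times. For the process \(X_t\) in (7) and a sufficiently regular
function \(f\) (Karatzas and Shreve, 1991, p. 153),

\[df(X_t, t) = \left\{\frac{\partial f(X_t, t)}{\partial t} + \frac{1}{2}\frac{\partial^2 f(X_t, t)}{\partial x^2}\,\sigma^2(X_t, t)\right\}dt + \frac{\partial f(X_t, t)}{\partial x}\,dX_t. \qquad (10)\]

The formula can be understood as the second-order Taylor expansion of \(f(X_{t+\Delta}, t + \Delta) - f(X_t, t)\)
by noticing that \((X_{t+\Delta} - X_t)^2\) is approximately \(\sigma^2(X_t, t)\Delta\).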

The Markovian property plays an important role in statistical inference. According to Theorem
5.4.20 of Karatzas and Shreve (1991), the solution \(X_t\) to equation (5) is Markovian, provided that
the coefficient functions \(\mu\) and \(\sigma\) are bounded on compact subsets. Let \(p_\Delta(y|x)\) be the transition
density, the conditional density of \(X_{t+\Delta} = y\) given \(X_t = x\). The transition density must satisfy the
forward and backward Kolmogorov equations (page 282, Karatzas and Shreve 1991).

Under the linear growth and Lipschitz conditions, and additional conditions on the boundary
behavior of the functions \(\mu\) and \(\sigma\), the solution to equation (5) is positive and ergodic. The invariant
density is given by

\[f(x) = 2C_0\,\sigma^{-2}(x)\exp\left(2\int_{\cdot}^{x} \mu(y)\sigma^{-2}(y)\,dy\right), \qquad (11)\]

where \(C_0\) is a normalizing constant and the lower limit of the integral does not matter. If the
initial distribution is taken from the invariant density, then the process \(\{X_t\}\) is stationary with the
marginal density \(f\) and transition density \(p_\Delta\).
Let \(H_t\) be the operator defined by

\[(H_t g)(x) = E(g(X_t) | X_0 = x), \quad x \in R, \qquad (12)\]

where \(g\) is a Borel measurable bounded function on \(R\). A stationary process \(X_t\) is said to satisfy
the condition \(G_2(s, \alpha)\) of Rosenblatt (1970) if there exists an \(s\) such that

\[\|H_s\|_2^2 = \sup_{\{f:\, E f(X) = 0\}} \frac{E(H_s f)^2(X)}{E f^2(X)} \le \alpha^2 < 1,\]

namely the operator is contractive. As a consequence of the semigroup (\(H_{s+t} = H_s H_t\)) and
contraction properties, the condition \(G_2\) implies (Banon, 1977) that for any \(t \in [0, \infty)\), \(\|H_t\|_2 \le \alpha^{t/s - 1}\).
The latter implies, by the Cauchy-Schwarz inequality, that

\[\rho(t) = \sup_{g_1, g_2} \mathrm{corr}(g_1(X_0), g_2(X_t)) \le \alpha^{t/s - 1}. \qquad (13)\]

That is, the \(\rho\)-mixing coefficient decays exponentially fast. Banon and Nguyen (1981) show further
that for a stationary Markov process, \(\rho(t) \to 0\) is equivalent to (13), namely, \(\rho\)-mixing and geometric
\(\rho\)-mixing are equivalent.



2.3 Valuation of contingent claims 

An important application of SDEs is the pricing of financial derivatives, such as options and bonds.
It forms a beautiful modern asset pricing theory and provides useful guidance in practice. Hull (2003)
and Duffie (2001) offer very nice introductions to the field.

The simplest financial derivative is the European call option. A call option is the right to buy an 
asset at a certain price K (strike price) before or at expiration time T. A put option gives the right 
to sell an asset at a certain price K (strike price) before or at expiration. European options allow 
option holders to exercise only at maturity, while American options can be exercised at any time 
before expiration. Most stock options are American, while options on stock indices are European. 



Figure 2: (a) Payoff of a call option. (b) Payoff of a put option. (c) Payoff of a portfolio of 4
options with different strike prices and different (long and short) positions.



The payoff for a European call option is \((X_T - K)_+\), where \(X_T\) is the price of the stock at
expiration \(T\). When the stock rises above the strike price \(K\), one can exercise the right and make a
profit of \(X_T - K\). However, when the stock falls below \(K\), one lets the right lapse and makes no profit.
Similarly, a European put option has payoff \((K - X_T)_+\). See Figure 2. By creating a portfolio
with different maturities and different strike prices, one can obtain all kinds of payoff functions. As
an example, suppose that a portfolio of options consists of contracts on the S&P 500 index maturing in 6
months: one call option with strike price $1,200, one put option with strike price $1,050, and $40
cash, but a short position (borrowing, or -1 contract) on one call option with strike price $1,150 and
on one put option with strike price $1,100. Figure 2(c) shows the payoff function of such a portfolio of
options at the expiration \(T\). Clearly, such an investor bets that the S&P 500 index will be around
$1,125 in 6 months and limits the risk on the investment. Thus, the European call and put options
are fundamental options as far as the payoff function at time \(T\) is concerned. There are many other
exotic options such as Asian options, look-back options and barrier options, which have different
payoff functions, and these functions can be path dependent. See Chapter 18 of Hull (2003).
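
The portfolio example can be checked numerically. The sketch below is only an illustration: the function names and the grid of index levels are chosen here for convenience, while the strikes, positions and cash amount are those quoted in the text.

```python
import numpy as np

def call_payoff(price, strike):
    """Payoff (price - strike)_+ of a European call at expiration."""
    return np.maximum(np.asarray(price, dtype=float) - strike, 0.0)

def put_payoff(price, strike):
    """Payoff (strike - price)_+ of a European put at expiration."""
    return np.maximum(strike - np.asarray(price, dtype=float), 0.0)

def portfolio_payoff(price):
    """Long call (K=1200), long put (K=1050), $40 cash,
    short call (K=1150) and short put (K=1100)."""
    return (call_payoff(price, 1200.0) + put_payoff(price, 1050.0) + 40.0
            - call_payoff(price, 1150.0) - put_payoff(price, 1100.0))

# Index levels at expiration (an arbitrary grid chosen for illustration).
index_levels = np.array([1000.0, 1050.0, 1100.0, 1125.0, 1150.0, 1200.0, 1300.0])
payoffs = portfolio_payoff(index_levels)
print(dict(zip(index_levels, payoffs)))
```

The payoff is $40 when the index ends between 1,100 and 1,150 and is floored at -$10 outside the interval from 1,050 to 1,200, which is the sense in which the position bets on a level near 1,125 with limited risk.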

Suppose that the asset price follows the SDE (7) and there is a riskless investment alternative
such as a bond which earns a compounding rate of interest \(r_t\). Suppose that the underlying asset pays
no dividend. Let \(\beta_t\) be the value of the riskless bond at time \(t\). Then, with initial investment \(\beta_0\),

\[\beta_t = \beta_0 \exp\left(\int_0^t r_s\,ds\right),\]

thanks to the compounding of interest. Suppose that the probability measure \(Q\) is equivalent to
the original probability measure \(P\), namely \(P(A) = 0\) if and only if \(Q(A) = 0\). The measure \(Q\)
is called an equivalent martingale measure for deflated price processes of given securities if these
processes are martingales with respect to \(Q\). An equivalent martingale measure is also referred to
as a "risk-neutral" measure if the deflator is the bond price process. See Chapter 6 of Duffie (2001).

When the markets are dynamically complete, the price of the European option with payoff
\(\Psi(X_T)\) with initial price \(X_0 = x_0\) is

\[P_0 = \exp\left(-\int_0^T r_s\,ds\right) E^Q(\Psi(X_T) | X_0 = x_0), \qquad (14)\]

where \(Q\) is the equivalent martingale measure for the deflated price process \(X_t/\beta_t\). Namely, it is
the discounted value of the expected payoff in the risk-neutral world. The formula is derived by
using the so-called relative pricing approach, which values the price of the option from given prices
of a portfolio, consisting of the risk-free bond and the stock, with the identical payoff as the option
at the expiration.

As an illustrative example, suppose that the price of a stock follows the geometric Brownian
motion \(dX_t = \mu X_t\,dt + \sigma X_t\,dW_t\) and that the risk-free rate \(r\) is constant. Then, the deflated price
process \(Y_t = \exp(-rt)X_t\) follows the SDE:

\[dY_t = (\mu - r)Y_t\,dt + \sigma Y_t\,dW_t.\]

The deflated price process is not a martingale as the drift is not zero. The risk-neutral measure
is the one that makes the drift zero. To achieve this, we appeal to the Girsanov theorem, which
changes the drift of a diffusion process without altering the diffusion, via a change of probability
measure. Under the "risk-neutral" probability measure \(Q\), the process \(Y_t\) satisfies \(dY_t = \sigma Y_t\,dW_t\), a
martingale. Hence, the price process \(X_t = \exp(rt)Y_t\) under \(Q\) follows

\[dX_t = rX_t\,dt + \sigma X_t\,dW_t. \qquad (15)\]

Using exactly the same derivation, one can easily generalize the result to the price process (5).
Under the risk-neutral measure, the price process (5) follows

\[dX_t = rX_t\,dt + \sigma(X_t)\,dW_t.\]

The intuitive explanation of this is clear: all stocks in the "risk-neutral" world are expected to
earn the same rate as the risk-free bond.

For the geometric Brownian motion, by an application of the Itô formula (10) to (15), we have
under the "risk-neutral" measure

\[\log X_t - \log X_0 = (r - \sigma^2/2)t + \sigma W_t. \qquad (16)\]

Note that given the initial price \(X_0\), the price follows a log-normal distribution. Evaluating the
expectation in (14) for the European call option with payoff \(\Psi(X_T) = (X_T - K)_+\), one obtains the
Black-Scholes (1973) option pricing formula:

\[P_0 = x_0\Phi(d_1) - K\exp(-rT)\Phi(d_2), \qquad (17)\]

where \(\Phi\) is the standard normal distribution function, \(d_1 = \{\log(x_0/K) + (r + \sigma^2/2)T\}\{\sigma\sqrt{T}\}^{-1}\) and \(d_2 = d_1 - \sigma\sqrt{T}\).
2.4 Simulation of stochastic models 

Simulation methods provide useful tools for the valuation of financial derivatives and other financial
instruments when the analytic formula (14) is hard to obtain. They also provide useful tools for
assessing the performance of statistical methods and statistical inferences.

The simplest method is perhaps the Euler scheme. The SDE (7) is approximated as

\[X_{t+\Delta} = X_t + \mu(t, X_t)\Delta + \sigma(t, X_t)\Delta^{1/2}\varepsilon_t, \qquad (18)\]

where \(\{\varepsilon_t\}\) is a sequence of independent random variables with the standard normal distribution.
The time unit is usually a year. Thus, the monthly, weekly and daily data correspond, respectively,
to \(\Delta\) = 1/12, 1/52 and 1/252 (there are approximately 252 trading days per year). Given an initial
value, one can recursively apply (18) to obtain a sequence of simulated data \(\{X_{j\Delta}, j = 1, 2, \ldots\}\).
The approximation error can be reduced if one uses a smaller step size \(\Delta/M\) for a given integer \(M\)
to obtain first a more detailed sequence \(\{X_{j\Delta/M}, j = 1, 2, \ldots\}\) and then takes the subsequence
\(\{X_{j\Delta}, j = 1, 2, \ldots\}\). For example, to simulate daily prices of a stock, one can simulate hourly data
first and then take the daily closing prices. Since the step size \(\Delta/M\) is smaller, the approximation
(18) is more accurate. However, the computational cost is about a factor of \(M\) higher.
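
The recursion (18) with the finer step \(\Delta/M\) can be sketched as follows. This is an illustrative implementation, not code from the paper: the function name `euler_path`, its argument `substeps` (playing the role of \(M\)) and the Vasicek-type coefficients in the example are assumptions introduced here.

```python
import numpy as np

def euler_path(mu, sigma, x0, delta, n_steps, substeps=1, rng=None):
    """Euler scheme (18) run with step delta/substeps; returns X at 0, delta, 2*delta, ...

    mu and sigma are callables (t, x) -> float.
    """
    rng = np.random.default_rng() if rng is None else rng
    h = delta / substeps                      # finer step size Delta/M
    path = np.empty(n_steps + 1)
    t, x = 0.0, float(x0)
    path[0] = x
    for j in range(1, n_steps + 1):
        for _ in range(substeps):
            eps = rng.standard_normal()       # independent N(0, 1) shock
            x = x + mu(t, x) * h + sigma(t, x) * np.sqrt(h) * eps
            t += h
        path[j] = x                           # keep only every M-th simulated value
    return path

# Hypothetical example: ten years of weekly data, simulated with 10 substeps per week.
kappa, alpha, vol = 0.5, 0.06, 0.02
weekly = euler_path(lambda t, x: kappa * (alpha - x), lambda t, x: vol,
                    x0=alpha, delta=1.0 / 52.0, n_steps=520, substeps=10,
                    rng=np.random.default_rng(0))
```

Setting `substeps=1` recovers the plain Euler scheme; larger values trade computing time for accuracy, as described above.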
The Euler scheme has convergence rate \(\Delta^{1/2}\), which is called a strong order 0.5 approximation by
Kloeden et al. (1996). Higher order approximations can be obtained by the Itô-Taylor expansion
(see Schurz, 2000, page 242). In particular, a strong order-one approximation is given by

\[X_{t+\Delta} = X_t + \mu(t, X_t)\Delta + \sigma(t, X_t)\Delta^{1/2}\varepsilon_t + \frac{1}{2}\sigma(t, X_t)\sigma'_x(t, X_t)\Delta\{\varepsilon_t^2 - 1\}, \qquad (19)\]

where \(\sigma'_x(t, x)\) is the partial derivative function with respect to \(x\). This method can be combined
with the smaller step size method of the last paragraph. For the time-homogeneous model (5),
an alternative form, without evaluating the derivative function, is given in (3.14) of Kloeden et
al. (1996).

The exact simulation method is available if one can simulate the data from the transition density.
Given the current value \(X_t = x_0\), one draws \(X_{t+\Delta}\) from the transition density \(p_\Delta(\cdot|x_0)\). The initial
condition can either be fixed at a given value or be generated from the invariant density (11). In
the latter case, the generated sequence is stationary.

There are only a few processes for which exact simulation is possible. For GBM, one can generate
the sequence from the explicit solution (16), where the Brownian motion can be simulated from
independent Gaussian increments. The conditional density of Vasicek's model (3) is Gaussian with
mean \(\alpha + (x_0 - \alpha)\rho\) and variance \(\sigma_\Delta^2 = \sigma^2(1 - \rho^2)/(2\kappa)\), as indicated by (4). Generate \(X_0\) from the
invariant density \(N(\alpha, \sigma^2/(2\kappa))\). With \(X_0\), generate \(X_\Delta\) from the normal distribution with mean
\(\alpha + (X_0 - \alpha)\exp(-\kappa\Delta)\) and variance \(\sigma_\Delta^2\). With \(X_\Delta\), generate \(X_{2\Delta}\) from the normal distribution
with mean \(\alpha + (X_\Delta - \alpha)\exp(-\kappa\Delta)\) and variance \(\sigma_\Delta^2\). Repeat this process until we obtain the
desired length of the process.
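
The recursive recipe for Vasicek's model can be written in a few lines. The sketch below follows the steps just described; the function name and the parameter values (\(\kappa = 2\), \(\alpha = 0.05\), \(\sigma = 0.02\), monthly sampling) are hypothetical choices made here so that the sample moments settle quickly.

```python
import numpy as np

def simulate_vasicek_exact(kappa, alpha, sigma, delta, n, rng):
    """Draw X_0, X_delta, ..., X_{n*delta} exactly from the Gaussian transition density."""
    rho = np.exp(-kappa * delta)                               # AR(1) coefficient in (4)
    sd_step = sigma * np.sqrt((1.0 - rho ** 2) / (2.0 * kappa))  # conditional std. deviation
    x = np.empty(n + 1)
    # Start from the invariant density N(alpha, sigma^2 / (2 kappa)).
    x[0] = rng.normal(alpha, sigma / np.sqrt(2.0 * kappa))
    for i in range(n):
        mean_next = alpha + (x[i] - alpha) * rho
        x[i + 1] = mean_next + sd_step * rng.standard_normal()
    return x

# Hypothetical parameter values, used only for illustration.
kappa, alpha, sigma, delta = 2.0, 0.05, 0.02, 1.0 / 12.0
path = simulate_vasicek_exact(kappa, alpha, sigma, delta, n=20000,
                              rng=np.random.default_rng(0))
rho_hat = np.corrcoef(path[:-1], path[1:])[0, 1]   # sample lag-one autocorrelation
```

The sample mean, standard deviation and lag-one autocorrelation of `path` should be close to \(\alpha\), \(\sigma/\sqrt{2\kappa}\) and \(\exp(-\kappa\Delta)\), respectively, whatever the size of \(\Delta\), since no discretization error is involved.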

For the CIR model (2), provided that \(q = 2\kappa\alpha/\sigma^2 - 1 \ge 0\) (a sufficient condition for \(X_t \ge 0\)),
the transition density can be expressed in terms of the modified Bessel function of the first kind.
This distribution is often referred to as the noncentral \(\chi^2\) distribution. That is, given \(X_t = x_0\),
\(2cX_{t+\Delta}\) has a noncentral \(\chi^2\) distribution with degrees of freedom \(2q + 2\) and noncentrality parameter
\(2u\), where \(c = 2\kappa/\{\sigma^2(1 - \exp(-\kappa\Delta))\}\) and \(u = c x_0 \exp(-\kappa\Delta)\). The invariant density is the Gamma
distribution with shape parameter \(q + 1\) and scale parameter \(\sigma^2/(2\kappa)\).
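
The form of the invariant density can be verified from formula (11). The short derivation below is a sketch added for illustration; it uses only the CIR drift \(\kappa(\alpha - x)\) and squared diffusion \(\sigma^2 x\) from (2), and the constant \(C\) absorbs all normalizing factors.

```latex
\begin{align*}
2\int^{x} \frac{\mu(y)}{\sigma^{2}(y)}\,dy
  &= 2\int^{x} \frac{\kappa(\alpha - y)}{\sigma^{2} y}\,dy
   = \frac{2\kappa\alpha}{\sigma^{2}}\log x - \frac{2\kappa}{\sigma^{2}}\,x + \text{const},\\
f(x) &= C\,x^{-1}\,x^{2\kappa\alpha/\sigma^{2}}\exp\!\left(-\frac{2\kappa}{\sigma^{2}}\,x\right)
      = C\,x^{q}\exp\!\left(-\frac{x}{\sigma^{2}/(2\kappa)}\right), \qquad x > 0 .
\end{align*}
```

This is the kernel of a Gamma density with shape \(q + 1\) and scale \(\sigma^2/(2\kappa)\). Its mean is \((q + 1)\sigma^2/(2\kappa) = \alpha\), in agreement with the interpretation of \(\alpha\) as the mean level.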

As an illustration, we consider the CIR model (2) with parameters \(\kappa = 0.21459\), \(\alpha = 0.08571\),
\(\sigma = 0.07830\) and \(\Delta = 1/12\). The model parameters are taken from Chapman and Pearson (2000).
We simulated 1000 monthly observations using both the Euler scheme (18) and the strong order-one
approximation (19) with the same random shocks. Figure 3 depicts one of their trajectories. The difference
is negligible. This is in line with the observation made by Stanton (1997) that as long as data
are sampled monthly or more frequently, the errors introduced by using the Euler approximation
are very small for stochastic dynamics that are similar to the CIR model.
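
A comparison of this kind can be reproduced along the following lines. This is a sketch rather than the code behind Figure 3: the random seed, the initial value \(X_0 = \alpha\) and the truncation of the state at zero inside the square root are assumptions introduced here. For the CIR diffusion \(\sigma x^{1/2}\), the correction term in (19) simplifies to \((\sigma^2/4)\Delta(\varepsilon_t^2 - 1)\).

```python
import numpy as np

# Parameters quoted in the text (Chapman and Pearson, 2000), monthly sampling.
kappa, alpha, sigma, delta = 0.21459, 0.08571, 0.07830, 1.0 / 12.0
n = 1000

rng = np.random.default_rng(1)        # arbitrary seed
shocks = rng.standard_normal(n)       # the same shocks drive both schemes

euler = np.empty(n + 1)
order_one = np.empty(n + 1)
euler[0] = order_one[0] = alpha       # assumed initial value

for i, eps in enumerate(shocks):
    xe = max(euler[i], 0.0)           # guard against a negative state under the root
    xo = max(order_one[i], 0.0)
    # Euler scheme (18)
    euler[i + 1] = (euler[i] + kappa * (alpha - euler[i]) * delta
                    + sigma * np.sqrt(xe * delta) * eps)
    # Strong order-one scheme (19): 0.5*sigma(x)*sigma'(x) = sigma^2/4 for the CIR model
    order_one[i + 1] = (order_one[i] + kappa * (alpha - order_one[i]) * delta
                        + sigma * np.sqrt(xo * delta) * eps
                        + 0.25 * sigma ** 2 * delta * (eps ** 2 - 1.0))

max_gap = np.max(np.abs(order_one - euler))
print(max_gap)
```

Each correction term is of order \(\sigma^2\Delta/4 \approx 1.3 \times 10^{-4}\), so the two trajectories should stay close relative to the level of the process.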

3 Estimation of return and volatility functions 

There is a large literature on the estimation of the return and volatility functions. Early references
include Pham (1981) and Prakasa Rao (1985). Some studies are based on continuously observed
data, while others are based on discretely observed data. For the latter, some regard \(\Delta\) as tending to
zero, while others regard \(\Delta\) as fixed. We briefly introduce some of the ideas.

Figure 3: Simulated trajectories (multiplied by 100) using the Euler approximation and the strong
order-one approximation for a CIR model. Top panel: the solid curve corresponds to the Euler
approximation and the dashed curve is based on the order-one approximation. Bottom panel: the
difference between the order-one scheme and the Euler scheme.

3.1 Methods of estimation 

We first outline several methods of estimation for parametric models. The idea can be extended
to nonparametric models. Suppose that we have a sample \(\{X_{i\Delta}, i = 0, \ldots, n\}\) from model (5).
Then, the log-likelihood function, under the stationary condition, is

\[\log f(X_0) + \sum_{i=1}^{n} \log p_\Delta(X_{i\Delta} | X_{(i-1)\Delta}). \qquad (20)\]

If the functions \(\mu\) and \(\sigma\) are parameterized and the explicit form of the transition density is available,
one can apply the maximum likelihood method. However, the explicit form of the transition density
is not available for many simple models such as the CKLS model (6). Even for the CIR model (2),
its maximum likelihood estimator is very difficult to find.

One simple technique is to rely on the Euler approximation scheme (18) and then to proceed as if the
data came from the Gaussian location and scale model. This method works well when \(\Delta\) is small,
but can create some biases when \(\Delta\) is large. However, the bias can be reduced by the following
calibration idea, called indirect inference by Gourieroux et al. (1993). The idea works as follows.
Suppose that the functions \(\mu\) and \(\sigma\) have been parameterized with unknown parameters \(\theta\). Use
the Euler approximation (18) and the maximum likelihood method to obtain an estimate \(\hat\theta_0\). For
each given parameter \(\theta\) around \(\hat\theta_0\), simulate data from (5) and apply the crude method to obtain
an estimate \(\hat\theta_1(\theta)\), which depends on \(\theta\). Since we simulated the data with the true parameter \(\theta\),
the function \(\hat\theta_1(\theta)\) tells us how to calibrate the estimate. See Figure 4. Calibrate the estimate via
\(\hat\theta_1^{-1}(\hat\theta_0)\), which reduces the bias of the estimate. One drawback of this method is that it is intensive
in computation and the calibration cannot easily be done when the dimensionality of the parameters
\(\theta\) is high.

Figure 4: Illustration of the idea of indirect inference. For each given true \(\theta\), one obtains an
estimate using the Euler approximation. This gives a calibration curve as shown. Now for a given
estimate \(\hat\theta_0 = 3\) based on the Euler approximation, one finds the calibrated estimate \(\hat\theta_1^{-1}(3) = 2.080\).
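
The calibration step can be illustrated on a toy problem. The sketch below is not the setting of Figure 4: it assumes a zero-mean Ornstein-Uhlenbeck process with unit diffusion and a single unknown parameter \(\kappa\), a deliberately large sampling interval, a grid search in place of a formal inversion, and function names invented for this example.

```python
import numpy as np

def simulate_ou(kappa, delta, shocks):
    """Exact path of dX = -kappa*X dt + dW at spacing delta, driven by given N(0,1) shocks."""
    rho = np.exp(-kappa * delta)
    sd_step = np.sqrt((1.0 - rho ** 2) / (2.0 * kappa))
    x = np.empty(len(shocks))
    x[0] = shocks[0] / np.sqrt(2.0 * kappa)       # draw from the invariant N(0, 1/(2 kappa))
    for i in range(1, len(shocks)):
        x[i] = rho * x[i - 1] + sd_step * shocks[i]
    return x

def euler_estimate(x, delta):
    """Crude estimate of kappa: Gaussian MLE under the Euler approximation,
    i.e. least squares of the increments on the current level."""
    slope = np.sum(x[:-1] * (x[1:] - x[:-1])) / np.sum(x[:-1] ** 2)
    return -slope / delta

delta, true_kappa, n = 0.5, 2.0, 5000             # hypothetical design with a large Delta
rng = np.random.default_rng(2)
data = simulate_ou(true_kappa, delta, rng.standard_normal(n))
theta0 = euler_estimate(data, delta)              # biased crude estimate

# Calibration curve theta -> theta1(theta), using common random numbers across theta.
sim_shocks = rng.standard_normal(n)
grid = np.linspace(0.5, 5.0, 91)
theta1 = np.array([euler_estimate(simulate_ou(th, delta, sim_shocks), delta)
                   for th in grid])

# Invert the curve: pick the theta whose simulated crude estimate is closest to theta0.
calibrated = grid[np.argmin(np.abs(theta1 - theta0))]
print(theta0, calibrated)
```

In this design the crude estimate concentrates near \((1 - e^{-\kappa\Delta})/\Delta \approx 1.26\) rather than 2, and the calibrated value should land much closer to the true parameter. Reusing the same shocks for every \(\theta\) keeps the simulated calibration curve smooth.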

Another method for bias reduction is to approximate the transition density in (20) by a higher
order approximation, and then to maximize the approximated likelihood function. Such a scheme
has been introduced by Aït-Sahalia (1999, 2002), who derives the expansion of the transition density
around a normal density function using Hermite polynomials. The intuition behind such an
expansion is that the increment \(X_{t+\Delta} - X_t\) of the diffusion process in (5) can be regarded as a sum of
many independent increments with a smaller step size, and hence the Edgeworth expansion can be
obtained for the distribution of \(X_{t+\Delta} - X_t\) given \(X_t\).

An "exact" approach is to use the method of moments. If the process \(X_t\) is stationary as in the
interest-rate models, the moment conditions can easily be derived by observing

\[E\{\lim_{\Delta \to 0} \Delta^{-1} E[g(X_{t+\Delta}) - g(X_t) | X_t]\} = \lim_{\Delta \to 0} \Delta^{-1} E[g(X_{t+\Delta}) - g(X_t)] = 0\]

for any function \(g\) satisfying the regularity condition such that the limit and the expectation are
exchangeable. The right-hand side is the expectation of \(dg(X_t)\). By Itô's formula (10), the above
equation reduces to

\[E[g'(X_t)\mu(X_t) + g''(X_t)\sigma^2(X_t)/2] = 0. \qquad (21)\]

For example, if \(g(x) = \exp(-ax)\) for some given \(a > 0\), then

\[E\exp(-aX_t)\{\mu(X_t) - a\sigma^2(X_t)/2\} = 0.\]

This can produce an arbitrary number of equations by choosing different \(a\)'s. If the functions \(\mu\) and
\(\sigma\) are parametrized, the number of moment conditions can exceed the number of parameters.
One way to use them efficiently is the generalized method of moments introduced by Hansen (1982),
which minimizes a quadratic form of the discrepancies between the empirical and the theoretical
moments, a generalization of the classical method of moments which solves the moment equations. The
weighting matrix in the quadratic form can be chosen to optimize the performance of the resulting
estimator. To improve the efficiency of the estimate, a large system of moments is needed. Thus,
the generalized method of moments needs a large system of nonlinear equations, which can be
expensive in computation. Further, the moment equations (21) use only the marginal information of
the process. Hence, the method is not efficient. For example, in the CKLS model (6), \(\sigma\) and \(\kappa\) are
estimable via (21) only through \(\sigma^2/\kappa\).
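
The last claim can be read off from the moment condition itself. The following short calculation is added as an illustration; it only substitutes the CKLS drift \(\kappa(\alpha - x)\) and squared diffusion \(\sigma^2 x^{2\gamma}\) from (6) into (21) and factors out \(\kappa\).

```latex
\begin{align*}
0 &= E\bigl[g'(X_t)\,\kappa(\alpha - X_t) + g''(X_t)\,\sigma^{2}X_t^{2\gamma}/2\bigr]\\
  &= \kappa\,E\Bigl[g'(X_t)(\alpha - X_t)
     + \frac{1}{2}\,\frac{\sigma^{2}}{\kappa}\,g''(X_t)\,X_t^{2\gamma}\Bigr].
\end{align*}
```

Since \(\kappa > 0\), every choice of \(g\) yields a restriction on \((\alpha, \gamma, \sigma^2/\kappa)\) only, so no collection of such marginal moment conditions can separate \(\sigma\) from \(\kappa\).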



3.2 Time-homogeneous model 

The Euler approximation can easily be used to estimate the drift and diffusion nonparametrically.
Let \(Y_{i\Delta} = \Delta^{-1}(X_{(i+1)\Delta} - X_{i\Delta})\) and \(Z_{i\Delta} = \Delta^{-1}(X_{(i+1)\Delta} - X_{i\Delta})^2\). Then,

\[E(Y_{i\Delta} | X_{i\Delta}) = \mu(X_{i\Delta}) + O(\Delta), \quad \text{and} \quad E(Z_{i\Delta} | X_{i\Delta}) = \sigma^2(X_{i\Delta}) + O(\Delta).\]

Thus, \(\mu(\cdot)\) and \(\sigma^2(\cdot)\) can be approximately regarded as the regression functions of \(Y_{i\Delta}\) and \(Z_{i\Delta}\)
on \(X_{i\Delta}\), respectively. Stanton (1997) applies kernel regression (Wand and Jones, 1995; Simonoff,
1996) to estimate the return and volatility functions. Let \(K(\cdot)\) be a kernel function and \(h\) be a
bandwidth. Stanton's estimators are given by

\[\hat\mu(x) = \frac{\sum_{i=0}^{n-1} Y_{i\Delta} K_h(X_{i\Delta} - x)}{\sum_{i=0}^{n-1} K_h(X_{i\Delta} - x)}, \quad \text{and} \quad \hat\sigma^2(x) = \frac{\sum_{i=0}^{n-1} Z_{i\Delta} K_h(X_{i\Delta} - x)}{\sum_{i=0}^{n-1} K_h(X_{i\Delta} - x)},\]

where \(K_h(u) = h^{-1}K(u/h)\) is a rescaled kernel. The consistency and asymptotic normality of the
estimator are studied in Bandi and Phillips (1998). Independently, Fan and Yao (1998) apply the
local linear technique (§6.3, Fan and Yao 2003) to estimate the return and volatility functions,
under a slightly different setup. The local linear estimator (Fan, 1992) is given by

$$\hat\mu(x) = \sum_{i=0}^{n-1} K_n(X_{i\Delta} - x, x) Y_{i\Delta}, \tag{22}$$

where

$$K_n(u, x) = K_h(u)\,\frac{S_{n,2}(x) - u S_{n,1}(x)}{S_{n,2}(x) S_{n,0}(x) - S_{n,1}(x)^2}, \tag{23}$$

with $S_{n,j}(x) = \sum_{i=0}^{n-1} K_h(X_{i\Delta} - x)(X_{i\Delta} - x)^j$, is the equivalent kernel
induced by the local linear fit. In contrast with the kernel method, the local linear weights depend
on both $X_{i\Delta}$ and $x$. In particular, they satisfy

$$\sum_{i=0}^{n-1} K_n(X_{i\Delta} - x, x) = 1, \quad \text{and} \quad \sum_{i=0}^{n-1} K_n(X_{i\Delta} - x, x)(X_{i\Delta} - x) = 0.$$

These are the key properties for the bias reduction of the local linear method as demonstrated in
Fan (1992). Further, Fan and Yao (1998) use the squared residuals

$$\Delta^{-1}(X_{(i+1)\Delta} - X_{i\Delta} - \hat\mu(X_{i\Delta})\Delta)^2$$

rather than $Z_{i\Delta}$ to estimate the volatility function. This further reduces the
approximation errors in the volatility estimation. They also show that the conditional variance
function can be estimated as well as if the conditional mean function were known in advance.
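
The kernel and local linear estimators above can be sketched in a few lines of Python. This is a
minimal illustration, not the implementation used in the cited papers: the Epanechnikov kernel, the
simulated Vasicek-type path, the bandwidth and all function names are assumptions chosen for the
example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def kernel_estimate(x_data, resp, x0, h):
    """Kernel regression estimate at x0, of the same form as Stanton's estimators."""
    w = epanechnikov((x_data - x0) / h) / h          # K_h(X_i - x0)
    return np.sum(w * resp) / np.sum(w)

def local_linear_weights(x_data, x0, h):
    """Equivalent kernel K_n(X_i - x0, x0) of (23)."""
    u = x_data - x0
    w = epanechnikov(u / h) / h
    s0, s1, s2 = (np.sum(w * u**j) for j in range(3))   # S_{n,0}, S_{n,1}, S_{n,2}
    return w * (s2 - u * s1) / (s0 * s2 - s1**2)

# Simulated path from an Euler scheme (illustrative parameters only).
rng = np.random.default_rng(1)
kappa, alpha, sigma, delta, n = 0.5, 0.06, 0.02, 1.0 / 52, 5000
x = np.empty(n + 1)
x[0] = alpha
for i in range(n):
    x[i + 1] = (x[i] + kappa * (alpha - x[i]) * delta
                + sigma * np.sqrt(delta) * rng.standard_normal())

y = np.diff(x) / delta           # Y_{i Delta}
z = np.diff(x) ** 2 / delta      # Z_{i Delta}
x_lag = x[:-1]

x0, h = alpha, 0.01
wts = local_linear_weights(x_lag, x0, h)
mu_hat = np.sum(wts * y)                          # local linear drift estimate (22)
sig2_kernel = kernel_estimate(x_lag, z, x0, h)    # kernel estimate of sigma^2(x0)
sig2_local_linear = np.sum(wts * z)               # local linear estimate of sigma^2(x0)
print(mu_hat, sig2_kernel, sig2_local_linear, sigma**2)
```

The weights returned by `local_linear_weights` sum to one and have zero first moment around `x0`,
which are the two properties displayed after (23).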

Table 1: Variance inflation factors by using higher order differences

Order k      1       2       3       4       5
V_1(k)     1.00    2.50    4.83    9.25   18.95
V_2(k)     1.00    3.00    8.00   21.66   61.50

Stanton (1997) derives higher order approximation schemes up to order 3 in an effort to reduce
biases. He suggested that higher order approximations must outperform lower order approximations.
To verify such a claim, Fan and Zhang (2003) derive the following order $k$ approximation scheme:

$$E(Y^*_{i\Delta} \mid X_{i\Delta}) = \mu(X_{i\Delta}) + O(\Delta^k), \quad \text{and} \quad E(Z^*_{i\Delta} \mid X_{i\Delta}) = \sigma^2(X_{i\Delta}) + O(\Delta^k), \tag{24}$$

where

$$Y^*_{i\Delta} = \Delta^{-1}\sum_{j=1}^{k} a_{k,j}\{X_{(i+j)\Delta} - X_{i\Delta}\} \quad \text{and} \quad Z^*_{i\Delta} = \Delta^{-1}\sum_{j=1}^{k} a_{k,j}\{X_{(i+j)\Delta} - X_{i\Delta}\}^2$$

and the coefficients $a_{k,j} = (-1)^{j+1}\binom{k}{j}/j$ are chosen to make the approximation error
in (24) of order $\Delta^k$. For example, the second order approximation is

$$1.5(X_{t+\Delta} - X_t) - 0.5(X_{t+2\Delta} - X_{t+\Delta}).$$

By using the independent increments of the Brownian motion, its variance is $1.5^2 + 0.5^2 = 2.5$
times as large as that of the first order difference. Indeed, Fan and Zhang (2003) show that while
higher order approximations give better approximation errors, we have to pay a huge premium for
variance inflation:

$$\mathrm{var}(Y^*_{i\Delta} \mid X_{i\Delta}) = \sigma^2(X_{i\Delta}) V_1(k) \Delta^{-1}\{1 + O(\Delta)\},$$
$$\mathrm{var}(Z^*_{i\Delta} \mid X_{i\Delta}) = 2\sigma^4(X_{i\Delta}) V_2(k)\{1 + O(\Delta)\},$$

where the variance inflation factors $V_1(k)$ and $V_2(k)$ are explicitly given by Fan and Zhang
(2003). Table 1 depicts some of the numerical results for the variance inflation factors.
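
The entries of Table 1 can be reproduced by a leading-order calculation. The Python sketch below
assumes driftless Brownian increments with constant volatility, so that $Y^*$ is a linear
combination of independent increments and $\mathrm{cov}(B_j^2, B_l^2) = 2\min(j,l)^2$ for a
standard Brownian motion $B$. The function names and this simplified derivation are assumptions of
the illustration; the exact expressions are those of Fan and Zhang (2003).

```python
from math import comb
import numpy as np

def coeffs(k):
    """a_{k,j} = (-1)^(j+1) * C(k, j) / j for j = 1, ..., k."""
    return np.array([(-1) ** (j + 1) * comb(k, j) / j for j in range(1, k + 1)])

def v1(k):
    """Variance of sum_j a_{k,j} (B_j - B_0): rewrite it in terms of unit increments.
    The l-th increment carries the tail sum a_{k,l} + ... + a_{k,k}."""
    a = coeffs(k)
    tail = np.cumsum(a[::-1])[::-1]
    return float(np.sum(tail**2))

def v2(k):
    """Variance of sum_j a_{k,j} B_j^2 divided by 2, using cov(B_j^2, B_l^2) = 2 min(j, l)^2."""
    a = coeffs(k)
    j = np.arange(1, k + 1)
    m = np.minimum.outer(j, j) ** 2
    return float(a @ m @ a)

for k in range(1, 6):
    print(k, round(v1(k), 2), round(v2(k), 2))
```

For $k = 2$ the first function gives $1.5^2 + 0.5^2 = 2.5$, the value worked out in the text.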

The above theoretical results have also been verified via empirical simulations in Fan and Zhang
(2003). The problem is not unique to nonparametric fitting; it is shared by the parametric
methods. Therefore, the methods based on higher order differences should seldom be used unless
the sampling interval is very wide (e.g. quarterly data). It remains open whether it is possible to
estimate nonparametrically the return and the volatility functions, without seriously inflating the
variance, with other higher order approximation schemes.



[Figure 5: two panels titled "First order difference" (top) and "Second order difference" (bottom);
horizontal axis: rate.]

Figure 5: Nonparametric estimates of volatility based on order 1 and order 2 differences. The bars
represent two standard deviations above and below the estimated volatility. Top panel: order 1 fit.
Bottom panel: order 2 fit.

As an illustration, we take the yields of the two-year Treasury notes depicted in Figure 1.
Figure 5 presents the nonparametrically estimated volatility functions based on the order $k = 1$
and $k = 2$ approximations. The local linear fit is employed with the Epanechnikov kernel and
bandwidth $h = 0.35$. It is evident that the order 2 approximation has higher variance than the
order 1 approximation. In fact, the magnitude of variance inflation is in line with the theoretical
result: the standard deviation increases by a factor of $\sqrt{3}$ from the order 1 to the order 2
approximation.

Stanton (1997) applies his kernel estimator to a Treasury bill data set and observes a nonlinear
return function in his nonparametric estimate, particularly in the region where the interest rate
is high (over 14%, say). This leads him to postulate the hypothesis that the return functions of
short-term rates are nonlinear. Chapman and Pearson (2000) study the finite sample properties
of Stanton's estimator. By applying his procedure to the CIR model, they find that Stanton's
procedure produces spurious nonlinearity, due to the boundary effect and the mean reversion.

Can we apply a formal statistical test to Stanton's hypothesis? The null hypothesis can
simply be formulated: the drift is of a linear form, as in the parametric models above. What is the
alternative hypothesis? For this kind of problem, our alternative model is usually vague. Hence, it
is natural to assume that the drift is a nonlinear smooth function. This becomes a testing problem
with a parametric null hypothesis versus a nonparametric alternative hypothesis. There is a large
literature on this. The basic idea is to compute a discrepancy measure between the parametric
estimates and nonparametric estimates and to reject the parametric hypothesis when the discrepancy
is large. See, for example, the book by Hart (1997). In an effort to derive a generally applicable
principle, Fan et al. (2001) propose the generalized likelihood ratio (GLR) tests for parametric
versus nonparametric or nonparametric versus nonparametric hypotheses. The basic idea is to replace
the maximum likelihood under nonparametric hypotheses (which usually does not exist) by the
likelihood under good nonparametric estimates. The method has been successfully employed by Fan and
Zhang (2003) for checking whether the return and volatility functions possess certain parametric
forms.

Various discretization schemes and estimation methods have been proposed for the case with
high frequency data over a long time horizon. More precisely, the studies are under the
assumptions that $\Delta_n \to 0$ and $n\Delta_n \to \infty$. See, for example, Dacunha-Castelle
and Florens (1986), Yoshida (1992), Kessler (1997), Arfi (1998), Gobet (2003), Cai and Hong (2003)
and references therein. Arapis and Gao (2003) investigate the mean integrated square errors of
several methods for estimating the drift and diffusion and compare their performance. Ait-Sahalia
and Mykland (2003, 2004) study the effects of random and discrete sampling when estimating
continuous-time diffusions. Bandi and Nguyen (1999) investigate small sample behaviors of
nonparametric diffusion estimators. Thorough studies of nonparametric estimation of conditional
variance functions can be found in Müller and Stadtmüller (1987), Hall and Carroll (1989), Ruppert
et al. (1997) and Härdle and Tsybakov (1997). In particular, §8.7 of Fan and Yao (2003) gives
various methods for estimating the conditional variance function.

3.3 Fixed sampling interval 

For practical analysis of financial data, it is hard to determine whether the sampling interval tends
to zero. The key question is whether the approximation errors for small $\Delta$ are negligible. It
is ideal when a method is applicable whether or not $\Delta$ is small. This kind of method is
possible, as demonstrated below.

The simplest problem to illustrate the idea is the kernel density estimation of the invariant
density of the stationary process $\{X_t\}$. For the given sample $\{X_{i\Delta}\}$, the kernel
density estimate for the invariant density is

$$\hat f(x) = n^{-1}\sum_{i=1}^{n} K_h(X_{i\Delta} - x), \tag{25}$$

based on the discrete data $\{X_{i\Delta}, i = 1, \cdots, n\}$. This method is valid for all
$\Delta$. It gives a consistent estimate of $f$ as long as the time horizon is long:
$n\Delta \to \infty$. We will refer to this kind of nonparametric method as state-domain smoothing,
as the procedure localizes in the state variable $X_t$. Various properties, including consistency
and asymptotic normality, of the kernel estimator (25) are studied by Bandi (1998) and Bandi and
Phillips (1998). Bandi (1998) also used the estimator (25), which is the same as the local time of
the process spending at a point $x$ except for a scaling constant, as a descriptive tool for
potentially nonstationary diffusion processes.

Why can the state-domain smoothing methods be employed as if the data were independent?
This is due to the fact that localizing in the state domain weakens the correlation structure and
that nonparametric estimates use essentially only local data. Hence many results on nonparametric
estimators for independent data continue to hold for dependent data, as long as their mixing
coefficients decay sufficiently fast. As mentioned at the end of §2.2, geometric mixing and
mixing are equivalent for time-homogeneous diffusion processes. Hence, the mixing coefficients decay
sufficiently fast for theoretical investigation.

The localizing and whitening can be understood graphically in Figure 6. Figure 6(a) shows
that there is very strong serial correlation in the yields of the two-year Treasury notes. However,
this correlation is significantly weakened for the local data in the neighborhood of 8% ± 0.2%. In
fact, as detailed in Figure 6(b), the indices of the data that fall in the local window are quite far
apart. This in turn implies weak dependence for the data in the local window, i.e. "whitening
by windowing". See §5.4 of Fan and Yao (2003) and Hart (1996) for further details. The effect of
dependence structure on the kernel density estimation was thoroughly studied by Claeskens and
Hall (2002).

The diffusion function can also be consistently estimated when $\Delta$ is fixed. In pricing the
derivatives of interest rates, Ait-Sahalia (1996a) assumes $\mu(x) = \kappa(\alpha - x)$. Using the
kernel density estimator $\hat f$ and the estimates of $\kappa$ and $\alpha$ from a least-squares
method, he applied (11) to estimate $\sigma(\cdot)$:
$\hat\sigma^2(x) = 2\int_0^x \hat\mu(u)\hat f(u)\,du / \hat f(x)$. He further established the
asymptotic normality of such an estimator. Gao and King (2004) propose tests of diffusion models
based on the discrepancy between the parametric and nonparametric estimates of the invariant
density.
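
The identity behind this estimator, $\sigma^2(x) = 2\int_0^x \mu(u) f(u)\,du / f(x)$, can be
checked numerically. The Python sketch below is only an illustration under stated assumptions: it
uses a CIR-type specification with $\sigma^2(x) = \sigma^2 x$, whose invariant density is a Gamma
density, and it plugs in the known density rather than a kernel estimate. Parameter values and
function names are hypothetical.

```python
import numpy as np
from math import gamma

kappa, alpha, sigma = 0.5, 0.06, 0.1      # illustrative CIR-type parameters
shape = 2 * kappa * alpha / sigma**2      # Gamma shape of the invariant law
rate = 2 * kappa / sigma**2               # Gamma rate of the invariant law

def invariant_density(u):
    """Gamma density, the stationary law of dX = kappa (alpha - X) dt + sigma sqrt(X) dW."""
    return rate**shape / gamma(shape) * u ** (shape - 1) * np.exp(-rate * u)

def sigma2_from_marginal(x, n_grid=20001):
    """sigma^2(x) = 2 * int_0^x mu(u) f(u) du / f(x), with the trapezoidal rule."""
    u = np.linspace(0.0, x, n_grid)
    integrand = kappa * (alpha - u) * invariant_density(u)
    integral = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(u))
    return 2 * integral / invariant_density(x)

recovered = {x: sigma2_from_marginal(x) for x in (0.04, 0.06, 0.10)}
for x, value in recovered.items():
    print(x, value, sigma**2 * x)         # recovered value versus sigma^2 * x
```

In an application the density and the drift parameters would be replaced by their estimates, so the
recovered volatility inherits their estimation error.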

The Ait-Sahalia (1996a) method is a simple one to illustrate that the volatility function can be
consistently estimated for fixed $\Delta$. However, we do not expect it to be efficient. Indeed, it
uses only the marginal information of the data. As shown in (20), almost all information is
contained in the transition density $p_\Delta(\cdot \mid \cdot)$. The transition density can be
estimated as in §4.2 below, regardless of whether $\Delta$ is small or large. The transition
density is in one-to-one correspondence with the drift and volatility for the diffusion process
(5). Hence, the drift and diffusion functions can be consistently estimated via inverting the
relationship between the transition density and the drift and diffusion functions.

Figure 6: (a) Lag 1 scatterplot of the two-year Treasury note data. (b) Lag 1 scatterplot of those
data falling in the neighborhood 8% ± 0.2%; the points are represented by the times of the observed
data. The numbers in the scatterplot show the indices of the data falling in the neighborhood. (c)
Kernel density estimate of the invariant density.

There is no simple formula for expressing the drift and diffusion in terms of the transition density.
The inversion is frequently carried out via a spectral analysis of the operator
$H_\Delta = \exp(\Delta L)$, where the infinitesimal operator $L$ is defined as

$$Lg(x) = \frac{\sigma^2(x)}{2} g''(x) + \mu(x) g'(x).$$

It has the property

$$Lg(x) = \lim_{\Delta \to 0} \Delta^{-1}[E\{g(X_\Delta) \mid X_0 = x\} - g(x)],$$

by Ito's formula (10). The operator $H_\Delta$ is the transition operator in that

$$H_\Delta g(x) = E\{g(X_\Delta) \mid X_0 = x\}.$$

The works of Hansen and Scheinkman (1995), Hansen et al. (1998) and Kessler and Sørensen (1999)
are based on the following idea. The first step is to estimate the transition operator $H_\Delta$
from the data. From the transition operator, one can identify the infinitesimal operator $L$ and
hence the functions $\mu(\cdot)$ and $\sigma(\cdot)$. More precisely, let $\lambda_1$ be the largest
negative eigenvalue of the operator $L$ with eigenfunction $\xi_1(x)$. Then,
$L\xi_1 = \lambda_1\xi_1$, or equivalently $\sigma^2\xi_1'' + 2\mu\xi_1' = 2\lambda_1\xi_1$. This
gives one equation of $\mu$ and $\sigma$. Another equation can be obtained via (11):
$(\sigma^2 f)' - 2\mu f = 0$. Multiplying the first equation by $f$ and using the second gives
$(\sigma^2 f \xi_1')' = 2\lambda_1 \xi_1 f$. Solving these two equations we obtain

$$\sigma^2(x) = \frac{2\lambda_1 \int_0^x \xi_1(y) f(y)\,dy}{f(x)\,\xi_1'(x)}$$

and another explicit expression for $\mu(x)$. Using the semigroup theory (Theorem IV.3.7, Engel and
Nagel 2000), $\xi_1$ is also an eigenfunction of $H_\Delta$ with eigenvalue
$\exp(\Delta\lambda_1)$. Hence, the proposal is to estimate the invariant density $f$ and the
transition density $p_\Delta(y \mid x)$, which implies the values of $\lambda_1$ and $\xi_1$. Gobet
et al. (2002) derive the optimal rate of convergence for such a scheme, using a wavelet basis. In
particular, they show that for fixed $\Delta$, the optimal rates of convergence for $\mu$ and
$\sigma$ are of orders $O(n^{-s/(2s+5)})$ and $O(n^{-s/(2s+3)})$, respectively, where $s$ is the
degree of smoothness of $\mu$ and $\sigma$.

3.4 Time-dependent model 

The time-dependent model (8) was introduced to accommodate the possibility of economic changes
over time. The coefficient functions in (8) are assumed to be slowly time-varying and smooth.
Nonparametric techniques can be applied to estimate these coefficient functions. The basic idea is
to localize in time, resulting in time-domain smoothing.

We first estimate the coefficient functions $\alpha_0(t)$ and $\alpha_1(t)$. For each given time
$t_0$, approximate the coefficient functions locally by constants: $\alpha_0(t) \approx a$ and
$\alpha_1(t) \approx b$ for $t$ in a neighborhood of $t_0$. Using the Euler approximation (18), we
run a local regression: minimize

$$\sum_{i=0}^{n-1} (Y_{i\Delta} - a - bX_{i\Delta})^2 K_h(i\Delta - t_0) \tag{26}$$

with respect to $a$ and $b$. This results in the estimates $\hat\alpha_0(t_0) = \hat a$ and
$\hat\alpha_1(t_0) = \hat b$, where $\hat a$ and $\hat b$ are the minimizers of the local regression
(26). Fan et al. (2003) suggest using a one-sided kernel such as
$K(u) = (1 - u^2)I(-1 < u < 0)$ so that only the historical data in the time interval
$(t_0 - h, t_0)$ are used in the above local regression. This facilitates forecasting and bandwidth
selection. Our experience shows that there are no significant differences between nonparametric
fitting with one-sided and two-sided kernels. We opt for local constant approximations instead of
local linear approximations for estimating time-varying functions, since the local linear fit can
create artificial, albeit insignificant, linear trends when the underlying functions $\alpha_0(t)$
and $\alpha_1(t)$ are indeed time-independent. To appreciate this, for constant functions
$\alpha_0$ and $\alpha_1$, a large bandwidth will be chosen to reduce the variance in the
estimation. This is in essence to fit a global linear regression for (26). If the local linear
approximations are used, since no variable selection procedures have been incorporated in the local
fitting (26), the slopes of the local linear approximations will not be estimated as zero and hence
artificial linear trends will be created for the estimated coefficients.

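
The local regression (26) with a one-sided kernel amounts to a weighted least-squares fit on the
recent past. The following Python sketch shows one way to organize the computation; the function
names, the simulated path and the parameter values are assumptions for illustration and are not
taken from Fan et al. (2003).

```python
import numpy as np

def one_sided_kernel(u):
    """K(u) = (1 - u^2) I(-1 < u < 0): only past observations receive weight."""
    return (1.0 - u**2) * ((u > -1.0) & (u < 0.0))

def local_drift_coefficients(x, delta, t0, h):
    """Minimize (26) at time t0, where x[i] is the observation at time i * delta.
    Returns (a_hat, b_hat), the local constant estimates of alpha_0(t0) and alpha_1(t0)."""
    y = np.diff(x) / delta                          # Y_{i Delta}
    t = np.arange(len(y)) * delta
    w = one_sided_kernel((t - t0) / h) / h          # K_h(i Delta - t0)
    keep = w > 0
    root_w = np.sqrt(w[keep])
    design = np.column_stack([np.ones(keep.sum()), x[:-1][keep]])
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y[keep] * root_w, rcond=None)
    return coef[0], coef[1]

# Illustrative weekly path with constant coefficients alpha_0 = 0.03 and alpha_1 = -0.5.
rng = np.random.default_rng(2)
delta, n = 1.0 / 52, 520
x = np.empty(n + 1)
x[0] = 0.08
for i in range(n):
    x[i + 1] = (x[i] + (0.03 - 0.5 * x[i]) * delta
                + 0.02 * np.sqrt(delta) * rng.standard_normal())

a_hat, b_hat = local_drift_coefficients(x, delta, t0=10.0, h=5.0)
print(a_hat, b_hat)
```

Drift estimates from a few years of data are typically very noisy, so the printed values should not
be expected to be close to the coefficients used in the simulation.
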
The coefficient functions in the volatility can be estimated by the local approximated likelihood
method. Let

$$\hat E_t = \Delta^{-1/2}\{X_{t+\Delta} - X_t - (\hat\alpha_0(t) + \hat\alpha_1(t)X_t)\Delta\}$$

be the normalized residuals. Then,

$$\hat E_t \approx \beta_0(t) X_t^{\beta_1(t)} \varepsilon_t. \tag{27}$$

The conditional log-likelihood of $\hat E_t$ given $X_t$ can easily be obtained by the approximation
(27). Using local constant approximations and incorporating the kernel weight, we obtain the local
approximated likelihood at each time point and an estimate of the functions $\beta_0(\cdot)$ and
$\beta_1(\cdot)$ at that time point. This type of local approximated-likelihood method is related
to the generalized method of moments of Hansen (1982), and the ideas of Florens-Zmirou (1993) and
Genon-Catalot and Jacod (1993).
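
A minimal sketch of such a local approximated likelihood, assuming standard normal
$\varepsilon_t$ in (27), is given below in Python. For a fixed $\beta_1$ the kernel-weighted
Gaussian likelihood is maximized in $\beta_0^2$ in closed form, and $\beta_1$ is then found by a
grid search. The grid search, the function names and the synthetic data are assumptions of this
illustration rather than the procedure of the cited papers.

```python
import numpy as np

def one_sided_kernel(u):
    """K(u) = (1 - u^2) I(-1 < u < 0)."""
    return (1.0 - u**2) * ((u > -1.0) & (u < 0.0))

def local_volatility_coefficients(resid, x, t, t0, h, beta1_grid):
    """Local constant Gaussian pseudo-likelihood for E_t ~ beta0 * X_t^beta1 * eps_t at t0.
    Returns (beta0_hat, beta1_hat)."""
    w = one_sided_kernel((t - t0) / h)
    keep = w > 0
    w, e2, logx = w[keep], resid[keep] ** 2, np.log(x[keep])
    best = None
    for b1 in beta1_grid:
        scaled = e2 * np.exp(-2.0 * b1 * logx)        # E_t^2 / X_t^(2 beta1)
        b0_sq = np.sum(w * scaled) / np.sum(w)        # closed-form maximizer in beta0^2
        loglik = -0.5 * np.sum(w * (np.log(b0_sq) + 2.0 * b1 * logx + scaled / b0_sq))
        if best is None or loglik > best[0]:
            best = (loglik, np.sqrt(b0_sq), b1)
    return best[1], best[2]

# Synthetic residuals with beta0 = 0.8 and beta1 = 0.5 (placeholder states, not a diffusion path).
rng = np.random.default_rng(3)
n = 20000
t = np.arange(n) / n
x = rng.uniform(0.02, 0.2, n)
resid = 0.8 * x**0.5 * rng.standard_normal(n)

beta0_hat, beta1_hat = local_volatility_coefficients(
    resid, x, t, t0=1.0, h=1.0, beta1_grid=np.arange(0.0, 1.51, 0.01))
print(beta0_hat, beta1_hat)
```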

Since the coefficient functions in both return and volatility functions are estimated using only
historical data, their bandwidths can be selected based on a form of the average prediction error.
See Fan et al. (2003) for details. The local least-squares regression can also be applied to estimate
the coefficient functions $\beta_0(t)$ and $\beta_1(t)$ via the transformed model [see (27)]

$$\log(\hat E_t^2) \approx 2\log\beta_0(t) + \beta_1(t)\log(X_t^2) + \log(\varepsilon_t^2),$$

but we do not pursue this direction, since the local least-squares estimate is known to be
inefficient in the likelihood context and the exponentiation of an estimated coefficient function of
$\log\beta_0(t)$ is unstable.

A question arises naturally whether the coefficients in model (8) are really time-varying. This
amounts, for example, to testing $H_0: \beta_0(t) = \beta_0$ and $\beta_1(t) = \beta_1$. Based on
the GLR technique, Fan et al. (2003) proposed a formal test for this kind of problem.

The coefficient functions in the semiparametric model can also be estimated by using the
profile approximated-likelihood method. For each given $\beta_1$, one can easily estimate
$\beta_0(\cdot)$ via the approximation (27), resulting in an estimate
$\hat\beta_0(\cdot; \beta_1)$. Regarding the nonparametric function $\beta_0(\cdot)$ as being
parameterized by $\hat\beta_0(\cdot; \beta_1)$, model (27) with $\beta_1(t) = \beta_1$ becomes a
"synthesized" parametric model with unknown $\beta_1$. The parameter $\beta_1$ can be estimated by
the maximum (approximated) likelihood method. Note that $\beta_1$ is estimated by using all the data
points, while $\hat\beta_0(t) = \hat\beta_0(t; \hat\beta_1)$ is obtained by using only the local
data points. See Fan et al. (2003) for details.

For other nonparametric methods of estimating volatility in time-inhomogeneous models, see
Härdle et al. (2003) and Mercurio and Spokoiny (2003). Their methods are based on the
time-dependent model with $\alpha_1(t) = \beta_1(t) = 0$.

3.5 State-domain versus Time-domain smoothing 

So far, we have introduced both state- and time-domain smoothing. The former relies on the
structural invariability implied by the stationarity assumption and uses predominantly the
(remote) historical data. The latter uses the continuity of underlying parameters and concentrates
basically on the recent data. This can be illustrated in Figure 7, using the yields of the 3-month
Treasury bills from January 8, 1954 to July 16, 2004, sampled at weekly frequency. On December
28, 1990, the interest rate is about 6.48%. To estimate the drift and diffusion around $x = 6.48$,
the state-domain smoothing focuses on the dynamics where interest rates are around 6.48%, the
horizontal bar with interest rates falling in 6.48% ± .25%. The estimated volatility is basically
the sample standard deviation of the differences $\{X_{i\Delta} - X_{(i-1)\Delta}\}$ within this
horizontal bar. On the other hand, the time-domain smoothing focuses predominantly on the recent
history, say one year, as illustrated in the figure. The time-domain estimate of volatility is
basically a sample standard deviation within the vertical bar.

[Figure 7 (plot): Yields of 3-month Treasury bills from 1954 to 2004.]

Figure 7: Illustration of the time- and state-domain smoothing using the yields of 3-month
Treasury bills. The state-domain smoothing localizes in the horizontal bars, while the time-domain
smoothing concentrates in the vertical bars.

For a given time series, it is hard to say which estimate is better. This depends on the underlying
stochastic processes and also on the time when the forecast is to be made. If the underlying process
is continuous and stationary such as model (5), both methods are applicable. For example, standing
on December 28, 1990, one can forecast the volatility by using the sample standard deviation in
either the horizontal bar or the vertical bar. However, the estimated precision depends on the local
data. Since the sample variance is basically linear in the squared differences, the standard errors
of both estimates can be assessed and used to guide the forecasting.

For stationary diffusion processes, it is possible to integrate both the time-domain and
state-domain estimates. Note that the historical data (with interest rates in 6.48% ± .25%) are far
apart in time (except the last piece, which can be ignored in the state-domain fitting) from the
data used in the time-domain smoothing (vertical bar). Hence, these two estimates are nearly
independent. The integrated estimate is a linear combination of these two nearly independent
estimates. The weights can easily be chosen to minimize the variance of the integrated estimator, by
using the assessed standard errors of the state- and time-domain estimators. This forms a
dynamically integrated predictor for volatility estimation, as the optimal weights change over time.

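
The variance-minimizing combination of two independent estimates is the familiar inverse-variance
weighting. The short Python sketch below assumes that the two estimates are exactly independent,
that they target the same quantity and that the weights sum to one; the function name and the
numbers are hypothetical.

```python
def integrate_estimates(est_state, se_state, est_time, se_time):
    """Variance-minimizing linear combination of two independent estimates.
    Returns the combined estimate, its standard error and the state-domain weight."""
    var_s, var_t = se_state**2, se_time**2
    w_state = var_t / (var_s + var_t)              # weight on the state-domain estimate
    combined = w_state * est_state + (1.0 - w_state) * est_time
    combined_se = (var_s * var_t / (var_s + var_t)) ** 0.5
    return combined, combined_se, w_state

# Hypothetical volatility estimates and standard errors on a given date.
combined, combined_se, w_state = integrate_estimates(0.020, 0.004, 0.015, 0.002)
print(combined, combined_se, w_state)
```

Because the standard errors are recomputed at each forecasting date, the weight `w_state` changes
over time, which is what makes the predictor dynamic.
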
3.6 Continuously observed data 

At the theoretical level, one may also examine the problem of estimating the drift and diffusion
functions assuming the whole process is observable up to time $T$. Let us assume again that the
observed process $\{X_t\}$ follows SDE (5). In this case, $\sigma^2(X_t)$ is the derivative of the
quadratic variation process of $X_t$ and hence is known up to time $T$. By (11), estimating the
drift function $\mu(x)$ is equivalent to estimating the invariant density $f$. In fact,

$$\mu(x) = [\sigma^2(x) f(x)]'/[2f(x)]. \tag{28}$$

The invariant density $f$ can easily be estimated by the kernel density estimation. When
$\Delta \to 0$, the summation in (25) converges to

$$\hat f(x) = T^{-1}\int_0^T K_h(X_t - x)\,dt. \tag{29}$$

This forms a kernel density estimate of the invariant density based on the continuously observed
data. Thus, an estimator for $\mu(x)$ can be obtained by substituting $\hat f(x)$ into (28). Such an
approach has been employed by Kutoyants (1998) and Dalalyan and Kutoyants (2000, 2003). They
established the sharp asymptotic minimax risk for estimating the invariant density $f$ and its
derivative, as well as the drift function $\mu$. In particular, the functions $f$, $f'$ and $\mu$
can be estimated with rates $T^{-1/2}$, $T^{-2s/(2s+1)}$ and $T^{-2s/(2s+1)}$, respectively, where
$s$ is the degree of smoothness of $\mu$. These are the optimal rates of convergence.

An alternative approach is to estimate the drift function directly from (22). By letting
$\Delta \to 0$, one can easily obtain a local linear regression estimator for continuously observed
data, which admits a similar form to (22) and (29). This is the approach that Spokoiny (2000) used.
He showed that this estimator attains the optimal rate of convergence and established further a
data-driven bandwidth such that the local linear estimator attains adaptive minimax rates.

4 Estimation of state price densities and transition densities 

The state-price density (SPD) is the probability density of the value of an asset under the
risk-neutral world (14) [see Cox and Ross (1976)] or equivalent martingale measure (Harrison and
Kreps, 1979). It is directly related to the pricing of financial derivatives. It is the transition
density of $X_T$ given $X_0$ under the equivalent martingale measure $Q$. The SPD does not depend on
the payoff function and hence it can be used to evaluate other illiquid derivatives, once it is
estimated from more liquid derivatives. On the other hand, the transition density characterizes the
probability law of a Markovian process and hence is useful for validating Markovian properties and
parametric models.

4.1 Estimation of state price density

For some specific models, the state price density can be formed explicitly. For example, for the
GBM with a constant risk-free rate $r$, according to (16), the SPD is log-normal, with mean
$\log x_0 + (r - \sigma^2/2)T$ and variance $\sigma^2 T$ on the log scale.

Assume that the SPD $f^*$ exists. Then, the European call option can be expressed as

$$C = \exp\Big(-\int_0^T r_s\,ds\Big)\int_K^\infty (x - K) f^*(x)\,dx.$$

See (14) (we have changed the notation from $P_0$ to $C$ to emphasize the price of the European call
option). Hence,

$$f^*(K) = \exp\Big(\int_0^T r_s\,ds\Big)\frac{\partial^2 C}{\partial K^2}. \tag{30}$$

This was observed by Breeden and Litzenberger (1978). Thus, the state price density can be
estimated from the European call options with different strike prices. With the estimated state
price density, one can price new or less-liquid securities such as over-the-counter derivatives or
nontraded options, using formula (14).
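
Relation (30) can be illustrated numerically. The Python sketch below assumes a constant interest
rate, so that $\exp(\int_0^T r_s\,ds) = \exp(rT)$, and uses Black-Scholes call prices without
dividends as the input price curve; the second derivative in the strike is approximated by a central
difference. The function names, the step size `dk` and the parameter values are assumptions for
illustration.

```python
from math import erf, exp, log, pi, sqrt

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s, k, t, r, vol):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (log(s / k) + (r + 0.5 * vol**2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def spd_from_calls(price, k, t, r, dk=0.5):
    """State price density via (30): exp(rT) times a central second difference in K."""
    second_diff = (price(k + dk) - 2.0 * price(k) + price(k - dk)) / dk**2
    return exp(r * t) * second_diff

def lognormal_density(x, s, t, r, vol):
    """Log-normal density with log-mean log(s) + (r - vol^2/2) t and log-variance vol^2 t."""
    m, v = log(s) + (r - 0.5 * vol**2) * t, vol**2 * t
    return exp(-(log(x) - m) ** 2 / (2.0 * v)) / (x * sqrt(2.0 * pi * v))

s0, t, r, vol = 100.0, 0.5, 0.03, 0.2       # illustrative values only
call = lambda k: bs_call(s0, k, t, r, vol)
spd_values = {k: spd_from_calls(call, k, t, r) for k in (90.0, 100.0, 110.0)}
for k, value in spd_values.items():
    print(k, value, lognormal_density(k, s0, t, r, vol))
```

With market data the price curve is observed with noise at discrete strikes, so the differentiation
step needs smoothing, which motivates the nonparametric regression approach described next.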

In general, the price of a European call option depends on the current stock price $S$, the strike
price $K$, the time to maturity $T$, the risk-free interest rate $r$ and the dividend yield rate
$\delta$. It can be written as $C(S, K, T, r, \delta)$. The exact form of $C$, in general, is hard
to determine, unless we assume the Black-Scholes model. Based on historical data
$\{(C_i, S_i, K_i, T_i, r_i, \delta_i), i = 1, \cdots, n\}$, where $C_i$ is the $i$th traded-option
price with associated characteristics $(S_i, K_i, T_i, r_i, \delta_i)$, Ait-Sahalia and Lo (1998)
fit the following nonparametric regression

$$C_i = C(S_i, K_i, T_i, r_i, \delta_i) + \varepsilon_i$$

to obtain an estimate of the function $C$ and hence the SPD $f^*$.

Due to the curse of dimensionality, the five-dimensional nonparametric function cannot be
estimated well with a practical range of sample sizes. Ait-Sahalia and Lo (1998) realized that and
proposed a few dimensionality reduction methods. First, by assuming that the option price depends
only on the futures price $F = S\exp((r - \delta)T)$, namely,

$$C(S, K, T, r, \delta) = C(F, K, T, r)$$

(the Black-Scholes formula satisfies such an assumption), they reduced the dimensionality from 5
to 4. By assuming further that the option-pricing function is homogeneous of degree one in $F$ and
$K$, namely,

$$C(S, K, T, r, \delta) = K C(F/K, T, r),$$

they reduced the dimensionality to 3. Ait-Sahalia and Lo (1998) imposed a semiparametric form
on the pricing formula:

$$C(S, K, T, r, \delta) = C_{BS}(F, K, T, r, \sigma(F, K, T)),$$

where $C_{BS}(F, K, T, r, \sigma)$ is the Black-Scholes pricing formula given in (17) and
$\sigma(F, K, T)$ is the implied volatility, computed by inverting the Black-Scholes formula. Thus,
the problem becomes nonparametrically estimating the implied volatility function $\sigma(F, K, T)$.
This is estimated by using a nonparametric regression technique from historical data, namely

$$\sigma_i = \sigma(F_i, K_i, T_i) + \varepsilon_i,$$

where $\sigma_i$ is the implied volatility of $C_i$ obtained by inverting the Black-Scholes formula.
By assuming further that $\sigma(F, K, T) = \sigma(F/K, T)$, the dimensionality is reduced to 2.
This is one of the options in Ait-Sahalia and Lo (1998).
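
Inverting the Black-Scholes formula for the implied volatility has no closed form, but because the
call price is increasing in the volatility a simple bisection suffices. The Python sketch below uses
the spot-price form of the formula without dividends, which is an assumption of this illustration
and may differ in parametrization from (17); function names and bracketing values are hypothetical.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s, k, t, r, vol):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (log(s / k) + (r + 0.5 * vol**2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def implied_vol(price, s, k, t, r, lo=1e-4, hi=5.0, tol=1e-10):
    """Bisection on the volatility; relies on bs_call being increasing in vol."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, t, r, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price an option at a known volatility, then recover that volatility.
observed_price = bs_call(100.0, 105.0, 0.5, 0.03, 0.25)
vol_hat = implied_vol(observed_price, 100.0, 105.0, 0.5, 0.03)
print(vol_hat)
```

The implied volatilities computed this way for each traded option would then serve as the responses
$\sigma_i$ in the nonparametric regression above.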

The state price density $f^*$ is non-negative and hence the function $C$ should be convex in the
strike price $K$. Ait-Sahalia and Duarte (2003) propose to estimate the option price under the
convexity constraint, using a local linear estimator. See also Härdle and Yatchew (2002) for a
related approach.

4.2 Estimation of transition densities 

The transition density of a Markov process characterizes the law of the process, except for the
initial distribution. It provides useful tools for checking whether or not such a process follows a
certain SDE and for statistical estimation and inferences. It is the state price density of the
price process under the risk-neutral world. If such a process were observable, the state price
density could be estimated using the methods to be introduced.

Assume that we have a sample $\{X_{i\Delta}, i = 0, \cdots, n\}$ from model (5). The "double-kernel"
method of Fan et al. (1996) is to observe that

$$E\{W_{h_2}(X_{i\Delta} - y) \mid X_{(i-1)\Delta} = x\} \approx p_\Delta(y \mid x), \quad \text{as } h_2 \to 0, \tag{31}$$

for a kernel function $W$. Thus, the transition density $p_\Delta(y \mid x)$ can be regarded
approximately as the nonparametric regression function of the response variable
$W_{h_2}(X_{i\Delta} - y)$ on $X_{(i-1)\Delta}$. An application of the local linear estimator (22)
yields

$$\hat p_\Delta(y \mid x) = \sum_{i=1}^{n} K_n(X_{(i-1)\Delta} - x, x) W_{h_2}(X_{i\Delta} - y), \tag{32}$$

where the equivalent kernel $K_n(u, x)$ was defined in (23). Fan et al. (1996) establish the
asymptotic normality of such an estimator under stationarity and $\rho$-mixing conditions
[necessarily decaying at a geometric rate for SDE (5)], which gives explicitly the asymptotic bias
and variance of the estimator. See also §6.5 of Fan and Yao (2003). The cross-validation idea of
Rudemo (1982) and Bowman (1984) can be extended to select bandwidths for estimating conditional
densities. See Fan and Yim (2004) and Hall et al. (2004).
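
The double-kernel estimator (32) can be sketched as follows in Python. The example simulates an
Ornstein-Uhlenbeck process through its exact Gaussian transition so that the estimate can be
compared with the true transition density. The Epanechnikov kernel for both $K$ and $W$, the
bandwidths and the function names are assumptions made for this illustration; in particular, the
bandwidths are fixed by hand rather than selected by cross-validation.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def transition_density(x_prev, x_next, x0, y_grid, h1, h2):
    """Double-kernel local linear estimate (32) of p_Delta(y | x0) on a grid of y values."""
    u = x_prev - x0
    w = epanechnikov(u / h1) / h1
    keep = w > 0                                        # only local data carry weight
    u, w, nxt = u[keep], w[keep], x_next[keep]
    s0, s1, s2 = (np.sum(w * u**j) for j in range(3))
    weights = w * (s2 - u * s1) / (s0 * s2 - s1**2)     # equivalent kernel (23)
    resp = epanechnikov((nxt[None, :] - y_grid[:, None]) / h2) / h2   # W_{h2}(X_i - y)
    return resp @ weights

# Exact simulation of dX = -kappa X dt + sigma dW at spacing delta (illustrative values).
rng = np.random.default_rng(4)
kappa, sigma, delta, n = 1.0, 1.0, 0.5, 20000
rho = np.exp(-kappa * delta)
step_sd = sigma * np.sqrt((1.0 - rho**2) / (2.0 * kappa))
noise = rng.standard_normal(n)
x = np.empty(n + 1)
x[0] = 0.0
for i in range(n):
    x[i + 1] = rho * x[i] + step_sd * noise[i]

y_grid = np.linspace(-4.0, 4.0, 401)
p_hat = transition_density(x[:-1], x[1:], 0.0, y_grid, h1=0.2, h2=0.2)
p_true = np.exp(-y_grid**2 / (2.0 * step_sd**2)) / (step_sd * np.sqrt(2.0 * np.pi))
print(p_hat[200], p_true[200])       # estimate and truth at y = 0, given x = 0
```

Since the local linear weights sum to one and each $W_{h_2}$ integrates to one, the estimate
integrates to one in $y$, although it may take small negative values where the weights are negative.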

The transition distribution can be estimated by integrating the estimator (32) over $y$.
Alternative estimators can be obtained by an application of the local logistic regression and the
adjusted Nadaraya-Watson method of Hall et al. (1999).

Early references on the estimation of the transition distributions and densities include Roussas
(1967, 1969) and Rosenblatt (1970).

4.3 Inferences based on transition densities 

With the estimated transition density, one can now verify whether the parametric models introduced
earlier are consistent with the observed data. Let $p_{\Delta,\theta}(y \mid x)$ be the transition
density under a parametric diffusion model. For example, for the CIR model, the parameter
$\theta = (\kappa, \alpha, \sigma)$. As in (20), ignoring the initial value $X_0$, the parameter
$\theta$ can be estimated by maximizing

$$\ell(p_{\Delta,\theta}) = \sum_{i=1}^{n} \log p_{\Delta,\theta}(X_{i\Delta} \mid X_{(i-1)\Delta}).$$

Let $\hat\theta$ be the maximum likelihood estimator. In the spirit of the GLR of Fan et al. (2001),
the GLR test for the null hypothesis $H_0: p_\Delta(y \mid x) = p_{\Delta,\theta}(y \mid x)$ is

$$\mathrm{GLR} = \ell(\hat p_\Delta) - \ell(p_{\Delta,\hat\theta}),$$

where $\hat p_\Delta$ is a nonparametric estimate of the transition density. Since the transition
density cannot be estimated well over the region where data are sparse (usually at boundaries of
the process), we need to truncate the nonparametric (and simultaneously parametric) evaluation of
the likelihood at appropriate intervals.

In addition to employing the GLR test, one can also compare directly the difference between the
parametric and nonparametric fits, resulting in test statistics such as
$\|\hat p_\Delta - p_{\Delta,\hat\theta}\|^2$ and $\|\hat P_\Delta - P_{\Delta,\hat\theta}\|^2$
for an appropriate norm $\|\cdot\|$, where $P_{\Delta,\hat\theta}$ and $\hat P_\Delta$ are the
estimates of the cumulative transition distributions under the parametric and nonparametric models,
respectively. An alternative method is to apply the GLR of Fan et al. (2001) to separately test the
forms of the drift and diffusion, as in Fan and Zhang (2003). The transition density approach
appears more elegant as it checks simultaneously the forms of drift and diffusion, but it is more
computationally intensive. In comparison with the invariant density-based approach of Arapis and Gao
(2003), it is consistent against a much larger family of alternatives.

One can also use the transition density to test whether an observed series is Markovian (from
personal communication with Yacine Ait-Sahalia). For example, if a process $\{X_{i\Delta}\}$ is
Markovian, then

$$p_{2\Delta}(y \mid x) = \int_{-\infty}^{+\infty} p_\Delta(y \mid z) p_\Delta(z \mid x)\,dz.$$

Thus, one can use the distance between $\hat p_{2\Delta}(y \mid x)$ and
$\int \hat p_\Delta(y \mid z)\hat p_\Delta(z \mid x)\,dz$ as a test statistic.
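
The identity underlying this test can be checked numerically for a process that is known to be
Markovian. The Python sketch below uses the exact Gaussian transition density of an
Ornstein-Uhlenbeck process in place of nonparametric estimates and approximates the integral by a
Riemann sum; the model, the grids and the function name are assumptions for illustration only.

```python
import numpy as np

def ou_transition(y, x, lag, kappa=1.0, sigma=1.0):
    """Gaussian transition density of dX = -kappa X dt + sigma dW over a time lag."""
    rho = np.exp(-kappa * lag)
    var = sigma**2 * (1.0 - rho**2) / (2.0 * kappa)
    return np.exp(-(y - rho * x) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

delta, x0 = 0.5, 0.3
y_grid = np.linspace(-2.0, 2.0, 41)
z_grid = np.linspace(-6.0, 6.0, 2401)
dz = z_grid[1] - z_grid[0]

# int p_Delta(y | z) p_Delta(z | x0) dz for every y on the grid.
inner = ou_transition(y_grid[:, None], z_grid[None, :], delta) \
    * ou_transition(z_grid, x0, delta)[None, :]
composed = inner.sum(axis=1) * dz

direct = ou_transition(y_grid, x0, 2 * delta)        # p_{2 Delta}(y | x0)
gap = np.max(np.abs(direct - composed))              # essentially zero for a Markov process

# A deliberately mismatched two-step density (lag 3 * delta) gives a visible discrepancy.
gap_wrong = np.max(np.abs(ou_transition(y_grid, x0, 3 * delta) - composed))
print(gap, gap_wrong)
```

In a data analysis both sides would be replaced by estimates such as (32), and the null
distribution of the distance would have to account for their estimation error.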

The transition density can also be used for parameter estimation. One possible approach is to find
the parameter that minimizes the distance $\|\hat P_\Delta - P_{\Delta,\theta}\|$. In this case, the
bandwidth should be chosen to optimize the performance for estimating $\theta$.

5 Concluding remark

Enormous efforts in financial econometrics have been made in modeling the dynamics of stock prices
and bond yields. These are directly related to pricing derivative securities, proprietary trading
and portfolio management. Various parametric models have been proposed to facilitate mathematical
derivations. They carry the risk that misspecification of models leads to erroneous pricing and
hedging strategies. Nonparametric models provide a powerful and flexible treatment. They aim at
reducing modeling biases by increasing somewhat the estimation variances. They provide an elegant
method for validating or suggesting a family of parametric models.

The versatility of nonparametric techniques in financial econometrics has been demonstrated in
this paper. They are applicable to various aspects of diffusion models: drift, diffusion, transition
densities, and even state price densities. They allow us to examine whether the stochastic dynamics
for stocks and bonds are time-varying and whether famous parametric models are consistent with
empirical financial data. They permit us to price illiquid or non-traded derivatives from liquid
derivatives.

The applications of nonparametric techniques in financial econometrics are far wider than what
has been presented. There are several areas where nonparametric methods have played a pivotal
role. One example is to test various versions of capital asset pricing models (CAPM) and their
related stochastic discount models (Cochrane, 2001). See, for example, the research manuscript by
Chen and Ludvigson (2003) in this direction. Another important class of models are stochastic
volatility models (Barndorff-Nielsen and Shephard, 2001 and Shephard 2004), where nonparametric
methods can be applied. The nonparametric techniques have been prominently featured in the
RiskMetrics of J. P. Morgan. They can be employed to forecast the risks of portfolios. See, for
example, Jorion (2000), Ait-Sahalia and Lo (2000), Chen and Tang (2003), Fan and Gu (2003) and
Chen (2004).